Results 1 - 7 of 7
1.
J Biomed Inform ; 152: 104617, 2024 04.
Article in English | MEDLINE | ID: mdl-38432534

ABSTRACT

OBJECTIVE: Machine learning methods hold the promise of leveraging available data and generating higher-quality data while alleviating the data collection burden on healthcare professionals. International Classification of Diseases (ICD) diagnosis data, collected globally for billing and epidemiological purposes, represent a valuable source of structured information. However, ICD coding is a challenging task. While numerous previous studies reported promising results in automatic ICD classification, they often describe model architectures specific to their input data, evaluated heterogeneously with different performance metrics and ICD code subsets. This study aims to explore the evaluation and construction of more effective Computer Assisted Coding (CAC) systems using generic approaches, focusing on the use of the ICD hierarchy, medication data and a feed-forward neural network architecture. METHODS: We conduct comprehensive experiments using the MIMIC-III clinical database, mapped to the OMOP data model. Our evaluations encompass various performance metrics, alongside investigations into multitask, hierarchical, and imbalanced learning for neural networks. RESULTS: We introduce a novel metric tailored to the ICD coding task, which offers interpretable insights for healthcare informatics practitioners, aiding them in assessing the quality of assisted coding systems. Our findings highlight that cherry-picking ICD codes diminishes retrieval performance without improving performance on the selected subset. We show that optimizing for metrics such as NDCG and AUPRC outperforms traditional F1-based metrics in ranking performance. We observe that neural network training on different ICD levels simultaneously offers minor benefits for ranking and significant runtime gains. However, our models do not derive benefits from hierarchical or class imbalance correction techniques for ICD code retrieval.
CONCLUSION: This study offers valuable insights for researchers and healthcare practitioners interested in developing and evaluating CAC systems. Using a straightforward sequential neural network model, we confirm that medical prescriptions are a rich data source for CAC systems, providing competitive retrieval capabilities for a fraction of the computational load compared to text-based models. Our study underscores the importance of metric selection and challenges existing practices related to ICD code sub-setting for model training and evaluation.
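
The abstract above treats ICD coding as a ranking problem and names NDCG as one of the metrics to optimize. As a minimal sketch of what that metric measures, the following computes NDCG@k for a single stay with binary relevance. The function names, the example codes and the scores are illustrative assumptions, not taken from the study, and the study's own novel metric is not reproduced here.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k entries of a ranked gain list."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains[:k]))


def ndcg_at_k(scores, true_codes, k):
    """NDCG@k for one stay, with binary relevance.

    scores: dict mapping each candidate code to a model score (hypothetical input format).
    true_codes: set of codes actually assigned to the stay.
    """
    # Rank candidate codes from highest to lowest score.
    ranked = sorted(scores, key=scores.get, reverse=True)
    gains = [1.0 if code in true_codes else 0.0 for code in ranked]
    # The ideal ranking places every true code first.
    ideal = [1.0] * min(k, len(true_codes))
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical scores for four candidate codes; two of them are correct.
example_scores = {"I10": 0.9, "J18": 0.6, "E11": 0.2, "N17": 0.1}
example_truth = {"I10", "E11"}
example_ndcg = ndcg_at_k(example_scores, example_truth, k=3)
```

Unlike F1, the score depends on where the correct codes fall in the ranked list rather than on a decision threshold, which is why it suits a system that proposes a ranked list of codes to a human coder.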


Subject(s)
Electronic Health Records , International Classification of Diseases , Humans , Neural Networks, Computer , Machine Learning , Computers , Clinical Coding/methods
2.
PLoS Comput Biol ; 15(3): e1006874, 2019 03.
Article in English | MEDLINE | ID: mdl-30830899

ABSTRACT

The T-cell receptor (TCR) repertoire relies on the diversity of receptors composed of two chains, called α and β, to recognize pathogens. Using results of high-throughput sequencing and computational chain-pairing experiments of human TCR repertoires, we quantitatively characterize the αβ generation process. We estimate the probabilities of a rescue recombination of the β chain on the second chromosome upon failure or success on the first chromosome. Unlike β chains, α chains recombine simultaneously on both chromosomes, resulting in correlated statistics of the two genes, which we predict using a mechanistic model. We find that ∼35% of cells express both α chains. Altogether, our statistical analysis gives a complete quantitative mechanistic picture that results in the observed correlations in the generative process. We learn that the probability to generate any TCRαβ is lower than 10^-12 and estimate the generation diversity and sharing properties of the αβ TCR repertoire.


Subject(s)
Receptors, Antigen, T-Cell, alpha-beta/biosynthesis , Chromosomes, Human , Humans , Probability , Receptors, Antigen, T-Cell, alpha-beta/genetics , Receptors, Antigen, T-Cell, alpha-beta/immunology , Recombination, Genetic
3.
Nat Commun ; 9(1): 561, 2018 02 08.
Article in English | MEDLINE | ID: mdl-29422654

ABSTRACT

High-throughput immune repertoire sequencing promises to lead to new statistical diagnostic tools for medicine and biology. Successful implementations of these methods require a correct characterization, analysis, and interpretation of these data sets. We present IGoR (Inference and Generation Of Repertoires), a comprehensive tool that takes B or T cell receptor sequence reads and quantitatively characterizes the statistics of receptor generation from both cDNA and gDNA. It probabilistically annotates sequences, and its modular structure can be used to investigate models of increasing biological complexity for different organisms. For B cells, IGoR returns the hypermutation statistics, which we use to reveal co-localization of hypermutations along the sequence. We demonstrate that IGoR outperforms existing tools in accuracy and estimate the sample sizes needed for reliable repertoire characterization.


Subject(s)
B-Lymphocytes/immunology , Receptors, Antigen, B-Cell/genetics , Receptors, Antigen, T-Cell/genetics , Software , T-Lymphocytes/immunology , V(D)J Recombination , B-Lymphocytes/cytology , Base Sequence , Benchmarking , DNA, Complementary/genetics , DNA, Complementary/immunology , Gene Expression , High-Throughput Nucleotide Sequencing , Humans , Immunity, Innate , Molecular Sequence Annotation , Receptors, Antigen, B-Cell/immunology , Receptors, Antigen, T-Cell/immunology , T-Lymphocytes/cytology
4.
Phys Biol ; 15(5): 056001, 2018 05 16.
Article in English | MEDLINE | ID: mdl-29360100

ABSTRACT

Cells of the immune system are confronted with opposing pro- and anti-inflammatory signals. Dendritic cells (DC) integrate these cues to make informed decisions on whether to initiate an immune response. Confronted with exogenous microbial stimuli, DC endogenously produce both anti- (IL-10) and pro-inflammatory (TNFα) cues whose joint integration controls the cell's final decision. Backed by experimental measurements, we present a theoretical model to quantitatively describe the integration mode of these opposing signals. We propose a two-step integration model that modulates the effect of the two types of signals: an initial bottleneck integrates both signals (IL-10 and TNFα), the output of which is later modulated by the anti-inflammatory signal. We show that the anti-inflammatory IL-10 signaling is long-ranged, as opposed to the short-ranged pro-inflammatory TNFα signaling. The model suggests that the population averaging and modulation of the pro-inflammatory response by the anti-inflammatory signal is a safety guard against excessive immune responses.


Subject(s)
Dendritic Cells/immunology , Interleukin-10/immunology , Models, Immunological , Tumor Necrosis Factor-alpha/immunology , Computer Simulation , Dendritic Cells/cytology , Humans , Lipopolysaccharides/immunology , Paracrine Communication
5.
PLoS Comput Biol ; 13(7): e1005572, 2017 Jul.
Article in English | MEDLINE | ID: mdl-28683116

ABSTRACT

The diversity of T-cell receptors recognizing foreign pathogens is generated through a highly stochastic recombination process, making the independent production of the same sequence rare. Yet unrelated individuals do share receptors, which together constitute a "public" repertoire of abundant clonotypes. The TCR repertoire is initially formed prenatally, when the enzyme inserting random nucleotides is downregulated, producing a limited diversity subset. By statistically analyzing deep sequencing T-cell repertoire data from twins, unrelated individuals of various ages, and cord blood, we show that T-cell clones generated before birth persist and maintain high abundances in adult organisms for decades, slowly decaying with age. Our results suggest that large, low-diversity public clones are created during prenatal life and survive over long periods, providing the basis of the public repertoire.


Subject(s)
Aging/genetics , Gene Rearrangement, T-Lymphocyte/genetics , Genetic Variation/genetics , Receptors, Antigen, T-Cell/physiology , T-Cell Antigen Receptor Specificity/genetics , Twins, Monozygotic/genetics , Aging/immunology , Base Sequence , Cells, Cultured , Gene Expression Regulation, Developmental/genetics , Gene Expression Regulation, Developmental/immunology , Humans , Molecular Sequence Data , Recombination, Genetic
6.
Bioinformatics ; 32(13): 1943-51, 2016 07 01.
Article in English | MEDLINE | ID: mdl-27153709

ABSTRACT

MOTIVATION: The diversity of the immune repertoire is initially generated by random rearrangements of the receptor gene during early T and B cell development. Rearrangement scenarios are composed of random events (choices of gene templates, base pair deletions and insertions) described by probability distributions. Not all scenarios are equally likely, and the same receptor sequence may be obtained in several different ways. Quantifying the distribution of these rearrangements is an essential baseline for studying immune system diversity. Inferring the properties of the distributions from receptor sequences is a computationally hard problem, requiring enumerating every possible scenario for every sampled receptor sequence. RESULTS: We present a hidden Markov model, which accounts for all plausible scenarios that can generate the receptor sequences. We developed and implemented a method based on the Baum-Welch algorithm that can efficiently infer the parameters for the different events of the rearrangement process. We tested our software tool on sequence data for both the alpha and beta chains of the T cell receptor. To test the validity of our algorithm, we also generated synthetic sequences produced by a known model, and confirmed that its parameters could be accurately inferred back from the sequences. The inferred model can be used to generate synthetic sequences, to calculate the probability of generation of any receptor sequence, as well as the theoretical diversity of the repertoire. We estimate this diversity to be [Formula: see text] for human T cells. The model gives a baseline to investigate the selection and dynamics of immune repertoires. AVAILABILITY AND IMPLEMENTATION: Source code and sample sequence files are available at https://bitbucket.org/yuvalel/repgenhmm/downloads CONTACT: elhanati@lpt.ens.fr or tmora@lps.ens.fr or awalczak@lpt.ens.fr.
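
The abstract above relies on a hidden Markov model to sum over every scenario that could have produced a sequence without enumerating them one by one. The sketch below shows that idea with the standard forward recursion on a toy two-state model. The state names and all probabilities are hypothetical placeholders, not the rearrangement model or the parameters of the paper's software, and only the likelihood step that Baum-Welch builds on is shown, not the parameter updates.

```python
# Toy two-state HMM; every number below is an illustrative assumption.
INIT = {"template": 0.6, "insertion": 0.4}
TRANS = {
    "template": {"template": 0.7, "insertion": 0.3},
    "insertion": {"template": 0.4, "insertion": 0.6},
}
EMIT = {
    "template": {"A": 0.5, "C": 0.5},
    "insertion": {"A": 0.1, "C": 0.9},
}


def forward_likelihood(obs, init, trans, emit):
    """Probability of the observed sequence, summed over all hidden state paths.

    The forward recursion costs O(len(obs) * n_states**2) operations,
    instead of enumerating n_states**len(obs) paths explicitly.
    """
    states = list(init)
    # alpha[s] = P(observations so far, current hidden state = s)
    alpha = {s: init[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            t: emit[t][symbol] * sum(alpha[s] * trans[s][t] for s in states)
            for t in states
        }
    return sum(alpha.values())


likelihood_ac = forward_likelihood("AC", INIT, TRANS, EMIT)
```

In this toy setting the returned value plays the role of a generation probability: it is the total probability of the sequence over all hidden paths, which is the same kind of quantity the abstract describes computing for receptor sequences.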


Subject(s)
Gene Rearrangement, T-Lymphocyte , Receptors, Antigen, T-Cell/genetics , Software , V(D)J Recombination , Algorithms , Humans , Probability , Sequence Alignment
7.
Philos Trans R Soc Lond B Biol Sci ; 370(1676), 2015 Sep 05.
Article in English | MEDLINE | ID: mdl-26194757

ABSTRACT

We quantify the VDJ recombination and somatic hypermutation processes in human B cells using probabilistic inference methods on high-throughput DNA sequence repertoires of human B-cell receptor heavy chains. Our analysis captures the statistical properties of the naive repertoire, first after its initial generation via VDJ recombination and then after selection for functionality. We also infer statistical properties of the somatic hypermutation machinery (exclusive of subsequent effects of selection). Our main results are the following: the B-cell repertoire is substantially more diverse than T-cell repertoires, owing to longer junctional insertions; sequences that pass initial selection are distinguished by having a higher probability of being generated in a VDJ recombination event; somatic hypermutations have a non-uniform distribution along the V gene that is well explained by an independent site model for the sequence context around the hypermutation site.
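
The abstract above attributes the non-uniform hypermutation profile to an independent site model of the sequence context. One way to read that is sketched below: each position around the site contributes its own additive term to a score, with no interaction terms between positions. The weights, the baseline, the four-position window and the logistic mapping to a probability are all hypothetical choices made for illustration, not the inferred parameters or the exact functional form used in the study.

```python
import math

# Hypothetical weights: WEIGHTS[offset][nucleotide] is the contribution of the
# nucleotide found at that offset from the candidate mutation site.
WEIGHTS = {
    -2: {"A": 0.3, "C": -0.2, "G": 0.1, "T": -0.2},
    -1: {"A": 0.5, "C": -0.4, "G": 0.4, "T": -0.5},
    1: {"A": -0.1, "C": 0.2, "G": -0.3, "T": 0.2},
    2: {"A": 0.0, "C": 0.1, "G": -0.1, "T": 0.0},
}
BASELINE = -3.0  # hypothetical overall log-odds of mutation at a site


def context_score(sequence, site):
    """Additive score: each neighboring position contributes independently."""
    return BASELINE + sum(
        WEIGHTS[offset][sequence[site + offset]] for offset in WEIGHTS
    )


def mutation_probability(sequence, site):
    """Map the additive score to a probability with a logistic function (an assumption)."""
    return 1.0 / (1.0 + math.exp(-context_score(sequence, site)))


example_score = context_score("AGCTA", 2)
example_probability = mutation_probability("AGCTA", 2)
```

Because the score is a plain sum, swapping the nucleotide at one offset shifts it by the same amount whatever the other neighbors are, which is the property that "independent site" refers to in this reading.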


Subject(s)
Antibody Diversity , B-Lymphocytes/immunology , Algorithms , Clonal Selection, Antigen-Mediated , Humans , Models, Genetic , Models, Immunological , Receptors, Antigen, B-Cell/genetics , Somatic Hypermutation, Immunoglobulin , V(D)J Recombination